max rank | avg. rank | sentence |
---|---|---|
329 | 85.1111 | Dit gief me in 't Limburgs weer mèt dao. |
333 | 72.1000 | Dao is noe neet al te väöl mie vaan euver. |
335 | 131.4545 | In 'n groet deil vaan de provincie Limburg weurt Limburgs gesproke. |
344 | 107.5000 | Dit is beveurbeeld 't geval in 't Belsj en Nederland. |
374 | 91.2857 | De aw stad ligk ouch aon 't water, in 't noorde vaan 't eiland. |
392 | 135.2000 | Vaan de res vaan häör leve is neet väöl bekind. |
392 | 97.6364 | Vaan 't dörp is ouch noe nog väöl trök te zien. |
394 | 118.0000 | Wie gezag kump 'r neet aon 't begin vaan e woord. |
415 | 100.5000 | Dit vint me veural in 't weste vaan 't land. |
447 | 119.1000 | Limburg weurt verdeild in e Belsj en e Nederlands deil. |
450 | 121.0000 | De lèste vörm heet dus 'n beteikenis op ziech. |
462 | 149.8000 | 't Ingels weurt door väöl lui es twiede taol gesproke. |
466 | 139.9286 | Ouch bij de paar modern hoezer die de plaots kint is dit 't geval. |
504 | 130.3333 | In 't Limburgs gebruuk me dit nog altied zoe. |
522 | 171.1000 | Zie make daorum ouch deil oet vaan de Nederlandse historie. |
568 | 258.8750 | Dat doen dialekte te zuie vaan Rotterdam wel. |
569 | 176.6250 | De naam weurt in fibberwarie 2000 bekind gemaak. |
586 | 346.4286 | Ze moot daorum zier populair zien gewees. |
591 | 113.0000 | Dit dörp is gein deil vaan de stad Rotterdam, al huurt 't wel bij de gemeinte. |
631 | 119.0000 | Heer woort ouch d'n ierste president vaan 't land. |
650 | 175.5000 | Pas tege 't ind vaan d'n iew kaome hei de Franse binne. |
665 | 189.0909 | 't Is neve 't Frans en 't Ingels 'n officieel taol. |
666 | 207.5000 | De bevolking nump evels nog ummer sterk touw. |
668 | 256.4444 | 't Grieks weurt ummer in 't Grieks sjrif gesjreve. |
683 | 251.8750 | In deze Belzje film weurt neve Nederlands en Frans ouch veur e groet deil Limbörgs gesproke. |
690 | 227.5000 | Tot 1 januari 2012 waor 't eige gemeinte. |
702 | 236.1000 | Op 10 oktober 2010 heel 't land op te bestoon. |
710 | 214.0000 | De hoofstad vörmp 'n provincie op ziech. |
711 | 407.1818 | Frankriek ies saer 1 jannewarie 2016 verdeild in 18 regio 's. |
742 | 192.7000 | Ouch in de middeliewe en laank daonao bleef dat zoe. |
The maximum word rank of a sentence is, by definition, the rank of its rarest word. If this maximum is low, every word in the sentence is a high-frequency word. For this reason, the table of sentences with the lowest maximum word rank is of interest. The table lists these sentences, restricted to those longer than 40 characters.
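As a minimal sketch of this definition, the following Python snippet computes the maximum and average word rank of a sentence from a frequency-rank lookup. The function name `sentence_rank_stats`, the whitespace tokenization, and the toy rank values are assumptions made for illustration only; they are not taken from the corpus.

```python
def sentence_rank_stats(sentence, rank):
    """Return (max rank, average rank) over the known words of a sentence.

    `rank` maps a word to its frequency rank (1 = most frequent word).
    Words missing from `rank` are skipped; None is returned if no word is known.
    """
    ranks = [rank[w] for w in sentence.split() if w in rank]
    if not ranks:
        return None
    return max(ranks), sum(ranks) / len(ranks)


# Hypothetical toy ranks, not real corpus data.
toy_rank = {"the": 1, "of": 2, "is": 3, "city": 250, "water": 400}

stats = sentence_rank_stats("the city is of water", toy_rank)
print(stats)  # the rarest word ("water") determines the maximum rank
```

A sentence gets a low maximum rank only if every one of its words is frequent, which is why a single rare word is enough to remove it from the table.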
The overall distribution of the maximum rank across all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table are of interest because they are usually easy to understand. The distribution may give insight into the corpus and may provide parameters for comparing languages.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a large corpus gives more striking results.
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m limit 30;
Distribution data:
select m, count(*) from (select 100* round((max(w_id)-100)/100) as m from sentences s, inv_w i where s.s_id=i.s_id and i.w_id>100 group by s.s_id) aa group by m;
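The two queries above can be tried on a toy database. The sketch below uses Python's built-in `sqlite3` module with invented rows. It assumes that `inv_w` links word ids to sentence ids and that word ids up to 100 are reserved, so that the rank is `w_id - 100`; both are inferred from the queries, not confirmed by the source. Because SQLite truncates integer division, the distribution query divides by `100.0` here to round to the nearest multiple of 100; an `order by m` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE inv_w (w_id INTEGER, s_id INTEGER);
""")

# Invented toy data: sentence 3 is too short for the table query.
conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
    (1, "A toy sentence that is comfortably longer than forty characters."),
    (2, "Another toy sentence, again longer than forty characters in total."),
    (3, "Too short."),
])
# (w_id, s_id) pairs; w_id 5 stands for a reserved id and is filtered out.
conn.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (5, 1), (101, 1), (105, 1), (230, 1),
    (102, 2), (460, 2), (1290, 2),
    (103, 3), (170, 3),
])

# Table data: sentences ordered by their maximum word rank.
table_rows = conn.execute("""
    select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence
    from sentences s, inv_w i
    where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100
    group by s.s_id order by m limit 30
""").fetchall()

# Distribution data: sentence counts per maximum rank, in bins of width 100.
dist_rows = conn.execute("""
    select m, count(*) from (
        select 100 * round((max(w_id)-100)/100.0) as m
        from sentences s, inv_w i
        where s.s_id=i.s_id and i.w_id>100
        group by s.s_id
    ) aa group by m order by m
""").fetchall()

for row in table_rows:
    print(row)
print(dist_rows)
```

The length filter applies only to the table query, so the short third sentence still contributes to the distribution.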
Explain the distribution, especially the increase toward its right end.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II